Abstract

Roy’s ‘Safety First’ criterion for selecting one risky asset from many is
adapted to the case of non-normal returns via the Cornish-Fisher expansion.
The resulting investment objective is consistent with first order stochastic
dominance, and is equal to the Sharpe ratio for the case of normal returns.
An investor selecting assets via this objective is not universally attracted
to positive skew; rather, the preference for skew depends on the term, the
expected return and the disastrous rate of return.


1 Introduction 

Mathematical economic theory posits that agents seek to maximize some utility
function. [5] In practice, however, real investors can rarely evoke their own
utility functions. Rather, when selecting from a number of risky assets, investors
(and quantitative-minded asset managers) often rank their choices based on
the moments of the returns stream, preferring e.g., higher expected returns
for a fixed level of volatility, ceteris paribus. Arguably the most commonly
used measure of investment opportunities is the Sharpe ratio, here defined as
$\zeta = (\mu - r_0)/\sigma$, where $r_0$ is the ‘disastrous’ or ‘risk-free’ rate
of return, and $\mu$ and $\sigma^2$ are the expected value and variance of the
returns stream, assumed to be known.¹

One objection to the use of the Sharpe ratio as an investment objective is that
it is generally not consistent with first order stochastic dominance. [7, 16, 20]
That is, one can construct two random variables, say $x$ and $y$, such that $x$
stochastically dominates $y$, but the Sharpe ratio of $x$ is lower than that of $y$.
Moreover, this deficiency cannot be solved by assuming away the $\mu < 0$ case.²
Hodges provides the classical counterexample, but such pathological cases are
easy to construct, as shown in the appendix.

* spav@alumni.emu.edu

¹ It might be more accurate to call $\zeta$ the signal-noise ratio, and reserve the term Sharpe
ratio for the analogous quantity constructed from sample estimates. Sharpe himself notes,
“Since the predictions cannot be obtained in any satisfactory manner, ... ex post values must
be used-the average rate of return of a portfolio must be substituted for its expected rate of
return, and the actual standard deviation of its rate of return for its predicted risk.” [15, p.
122] However, we will follow common usage in calling $\zeta$ the Sharpe ratio, without much risk
of confusion.

² The Sharpe ratio as an objective ‘prefers’ higher volatility in the case $\mu < 0$, and is thus
clearly inconsistent with second-order stochastic dominance. It is not clear, however, that the
sample analogue shares this deficiency.

There have been numerous attempts to generalize the Sharpe ratio to remedy
these deficiencies, making it suitable for the case of non-normal returns by
including higher order moments. [7, 16, 20] Hodges assumes an investor with the
CARA utility function, $U(w) = -e^{-\lambda w}$. For an asset with normally distributed
returns, the optimal amount to invest, long or short, in the asset is
$\mu/(\lambda\sigma^2)$,³ in the sense of maximizing the expected utility. The maximum
expected utility at this allocation is $-e^{-\zeta^2/2}$, ignoring the time term for
simplicity. This leads Hodges to define the “Generalized Sharpe ratio” as

$$\zeta_g = \sqrt{-2\log\left(-U^*\right)}, \tag{1}$$

where $U^*$ is the maximum expected utility under the CARA utility function.
[7] That is

$$U^* =_{\mathrm{df}} \max_{w} \mathrm{E}\left[-e^{-\lambda x w}\right], \tag{2}$$

and so

$$\zeta_g = \sqrt{\max_{w}\, -2\log\left(\mathrm{E}\left[e^{-\lambda x w}\right]\right)}. \tag{3}$$
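As a numerical sanity check on Equations 1 to 3, the sketch below searches over the allocation $w$ for an asset with normally distributed returns and confirms that the Generalized Sharpe ratio then reduces to $\mu/\sigma$. It assumes $x \sim \mathcal{N}(\mu, \sigma^2)$ with $r_0 = 0$ and uses the closed form of the normal moment generating function; the function names, the search bracket and the parameter values are illustrative choices, not part of Hodges’ definition.

```python
import math

def expected_cara_utility(w, mu, sigma, lam=1.0):
    """E[-exp(-lam * w * x)] for x ~ N(mu, sigma^2), via the normal MGF."""
    return -math.exp(-lam * w * mu + 0.5 * (lam * w * sigma) ** 2)

def maximize_over_allocation(mu, sigma, lam=1.0, lo=-100.0, hi=100.0, iters=200):
    """Ternary search for the allocation w maximizing expected utility.

    The expected utility is unimodal in w; the bracket [lo, hi] is an
    assumption and must contain the optimum.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if expected_cara_utility(m1, mu, sigma, lam) < expected_cara_utility(m2, mu, sigma, lam):
            lo = m1
        else:
            hi = m2
    w_star = 0.5 * (lo + hi)
    return w_star, expected_cara_utility(w_star, mu, sigma, lam)

def generalized_sharpe(u_star):
    """Equation 1: sqrt(-2 log(-U*))."""
    return math.sqrt(-2.0 * math.log(-u_star))

mu, sigma = 0.001, 0.01            # illustrative daily mean and volatility
w_star, u_star = maximize_over_allocation(mu, sigma)
zeta_g = generalized_sharpe(u_star)
print(w_star, mu / sigma ** 2)     # numerical vs. closed-form optimal allocation
print(zeta_g, mu / sigma)          # Generalized Sharpe ratio vs. Sharpe ratio
```

For non-normal returns the expectation has no such closed form in general, which is the computational difficulty that motivates the moment-based approximations discussed next.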

As Hodges’ objective is difficult to compute, Zakamouline and Koekebakker
carry his analysis to its logical conclusion, using Taylor’s theorem to describe
the Generalized Sharpe ratio in terms of the investor’s relative preferences for
higher order moments of wealth. [20] They derive an “adjusted for skew Sharpe
ratio”, defined as

$$\zeta_s = \zeta\sqrt{1 + \frac{b_3\gamma_3}{3}\zeta}, \tag{4}$$

where $\gamma_3$ is the skewness of the returns distribution, and $b_3$ is the investor’s
relative preference for third order moments:

$$b_3 = \frac{a_3}{a_2^2}, \quad\text{where}\quad a_k = \frac{U^{(k)}(w_r)}{U^{(1)}(w_r)},$$

and $U^{(k)}(w_r)$ denotes the $k$th derivative of the investor’s utility function at the
zero dollar allocation in the risky asset, denoted as $w_r$. For an investor with
HARA utility, the quantity $b_3$ is generally positive, and thus the skew adjusted
Sharpe ratio has positive derivative with respect to skewness (assuming $\zeta > 0$).
In fact, a necessary condition for the investor to demonstrate decreasing risk
aversion is that $b_3 > 1$, a result due to Pratt. [20, 13]

Smetters and Zhang carry this line of analysis further, showing that a valid
ranking of investments must take into account the investor’s preferences and
cannot be a function only of the distributions of returns. [16] Moreover, they
develop a ranking measure like the Sharpe ratio expressed in terms of the
cumulants of the returns distribution and the derivatives of the utility. Their
Theorem 9 establishes positive derivative of their objective with respect to odd
cumulants and negative derivative with respect to even cumulants of the returns
distribution, in accordance with the usual interpretations of ‘temperance’,
‘prudence’, ‘edginess’, etc. [16, 4] Smetters and Zhang describe how to
approximately compute their objective, showing that their third order
approximation matches that of Zakamouline and Koekebakker.

³ N.b., this is essentially the Markowitz portfolio on one asset.


It is only by Stigler’s Law of Eponymy that we know the quantity $\zeta$ as
“the Sharpe ratio,” instead of “Roy’s criterion.” [17] Sharpe first described his
“reward-to-variability ratio” in 1966 as a yardstick for comparing mutual funds,
but Roy described the same quantity in 1952 as a means of choosing among
risky assets, under the moniker of “Safety First.” [15, 14, 18] Roy’s justification
for this objective followed from Chebyshev’s inequality, which states that

$$\Pr\left\{|x - \mu| \ge \sqrt{k}\sigma\right\} \le \frac{1}{k}. \tag{5}$$

For a given $r_0 < \mu$, let $\sqrt{k} = (\mu - r_0)/\sigma$. Then since
$\Pr\{x - \mu \le -\sqrt{k}\sigma\} \le \Pr\{|x - \mu| \ge \sqrt{k}\sigma\}$, we have

$$\Pr\{x \le r_0\} = \Pr\{x - \mu \le -(\mu - r_0)\} \le \frac{1}{\zeta^2}. \tag{6}$$

Thus to minimize this bound on the probability of a loss (relative to $r_0$), one
should maximize $\zeta$.

2 Safety First 

The crux of Roy’s justification for the ‘Safety-First’ objective, which is just the
signal-noise ratio, is that it bounds the probability of a loss, defined as a return
less than $r_0$. The argument, based on Chebyshev’s inequality, yields only a rough
upper bound. There are some situations, however, where the signal-noise ratio
is exactly monotonic in the probability of a loss, for example, if the returns
are drawn from a scale-location family, like the Gaussian family. Note that the
central limit theorem tells us that, conditional on finite variance, the sample
mean of some random variable converges to a normal distribution, and thus
for the case of log returns, since the mean return is just the total log return
rescaled, the long term log return is approximately drawn from a scale-location
family.
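To give a sense of how rough the Chebyshev bound is, the following sketch compares the bound of Equation 6 with the exact loss probability when returns happen to be Gaussian. The function names and the grid of signal-noise ratios are illustrative assumptions; the bound is capped at one because it bounds a probability.

```python
from statistics import NormalDist

def chebyshev_loss_bound(zeta):
    """Bound of Equation 6 on Pr{x <= r0}, for zeta > 0 (capped at 1)."""
    return min(1.0, 1.0 / zeta ** 2)

def gaussian_loss_probability(zeta):
    """Exact Pr{x <= r0} when x is normal with signal-noise ratio zeta."""
    return NormalDist().cdf(-zeta)

for zeta in (0.5, 1.0, 2.0, 3.0):
    # Both columns decrease in zeta, but the bound is far from tight.
    print(zeta, chebyshev_loss_bound(zeta), gaussian_loss_probability(zeta))
```

Note that the bound is vacuous for $\zeta \le 1$, which covers the per-period signal-noise ratios typically encountered in asset management.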

We can maintain the spirit of Roy’s criterion by directly optimizing the
quantity he sought to maximize, viz. the probability of exceeding $r_0$. To match
the Sharpe ratio in the case of Gaussian returns, we need only invert the normal
CDF, resulting in the quantity:

$$\zeta_h =_{\mathrm{df}} -\Phi^{-1}\left(\Pr\{x \le r_0\}\right), \tag{7}$$

where $\Phi(\cdot)$ is the CDF of the standard normal distribution. When
$x \sim \mathcal{N}(\mu, \sigma^2)$, the probability that $x \le r_0$ is
$\Phi((r_0 - \mu)/\sigma)$, and so $\zeta_h$ equals the Sharpe ratio,
$(\mu - r_0)/\sigma$. This objective is legitimately a ‘generalized Sharpe ratio’,
since it agrees with the Sharpe ratio exactly for normal returns. [20]
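The agreement with the Sharpe ratio for normal returns can be checked numerically. The sketch below evaluates Equation 7 using the standard library’s normal distribution; the function name and the parameter values are illustrative assumptions.

```python
from statistics import NormalDist

def generalized_roy(prob_loss):
    """Equation 7: zeta_h = -Phi^{-1}(Pr{x <= r0}), for 0 < prob_loss < 1."""
    return -NormalDist().inv_cdf(prob_loss)

mu, sigma, r0 = 0.001, 0.01, 0.0              # illustrative values
prob_loss = NormalDist(mu, sigma).cdf(r0)     # Pr{x <= r0} for normal x
zeta_h = generalized_roy(prob_loss)
sharpe = (mu - r0) / sigma
print(zeta_h, sharpe)                         # the two should agree
```

For non-normal returns one would pass in the loss probability under the actual distribution (or an estimate of it), and the result would generally differ from the Sharpe ratio.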

It is trivial to verify that $\zeta_h$ is consistent with first order stochastic
dominance, or at least not inconsistent with it:⁴ if $x$ stochastically dominates
$y$, then $\Pr\{x \le r_0\} \le \Pr\{y \le r_0\}$ for all $r_0$. By monotonicity of
$\Phi^{-1}(\cdot)$, $\zeta_h$ is no smaller for $x$ than $y$. It should be clear,
however, that the converse does not, indeed cannot, hold: if $\zeta_h$ is higher for
$x$ than $y$, for a single $r_0$, it need not be the case that $x$ stochastically
dominates $y$. The simple proof is that since stochastic dominance does not form
a total ordering on probability distributions, but generalized Roy’s criterion
(for one choice of $r_0$) does form a total ordering, the latter ordering cannot
imply the former.

⁴ This statement is weak, but cannot be strengthened; it must be admitted, for example,
that for most $r_0$, $\zeta_h$ makes no distinction between the two assets of Hodges’ classic
counterexample.

Roy’s approximation is based on Chebyshev’s inequality. We can construct
tighter approximations to the probability of a loss via some classical
approximations to the central limit theorem. Suppose that one will observe $n$
independent draws from the returns stream, $x$. Without loss of generality,⁵ let
the disastrous event be that the observed sample mean return, $\hat{\mu}$, is less
than $r_0$. This is equivalent to

$$\sqrt{n}\,\frac{\hat{\mu} - \mu}{\sigma} \le \sqrt{n}\,\frac{r_0 - \mu}{\sigma}.$$

The cumulative distribution function of the quantity on the left hand side can
be approximated via some truncation of the Edgeworth expansion. [2]

Define $\delta =_{\mathrm{df}} \sqrt{n}(\mu - r_0)/\sigma$. The Edgeworth expansion is [1, 26.2.48]


$$\Pr\left\{\sqrt{n}\,\frac{\hat{\mu} - \mu}{\sigma} \le -\delta\right\}
= \Phi(-\delta) - \phi(\delta)\left[\frac{\gamma_3}{6\sqrt{n}}\mathrm{He}_2(\delta)\right]
+ \phi(\delta)\left[\frac{\gamma_4}{24n}\mathrm{He}_3(\delta) + \frac{\gamma_3^2}{72n}\mathrm{He}_5(\delta)\right]
+ \ldots \tag{8}$$

where $\Phi(x)$ and $\phi(x)$ are the cumulative distribution and density functions of
the standard unit normal, $\mathrm{He}_i(x)$ is the probabilist’s Hermite polynomial [1,
26.2.31], and $\gamma_i$ is the standardized $i$th cumulant, defined as the $i$th cumulant of
the distribution divided by $\sigma^i$. It happens to be the case that $\gamma_3$ is the skewness,
and $\gamma_4$ is the excess kurtosis of the distribution.
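The quality of this approximation can be checked in a case where the exact answer is available. The sketch below evaluates Equation 8 through the $1/n$ term, using the recurrence $\mathrm{He}_{j+1}(x) = x\,\mathrm{He}_j(x) - j\,\mathrm{He}_{j-1}(x)$, and compares it with the exact distribution of the mean of $n$ unit-rate exponential draws (a hypothetical test case, for which $\mu = \sigma = 1$, $\gamma_3 = 2$ and $\gamma_4 = 6$, and the sum follows an Erlang distribution). The function names and the choice $n = 20$, $\delta = 0$ are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def hermite_he(k, x):
    """Probabilist's Hermite polynomial He_k(x), via He_{j+1} = x He_j - j He_{j-1}."""
    prev, curr = 1.0, x              # He_0 and He_1
    if k == 0:
        return prev
    for j in range(1, k):
        prev, curr = curr, x * curr - j * prev
    return curr

def edgeworth_loss_probability(delta, n, gamma3, gamma4):
    """Equation 8 through the 1/n term: Pr{sqrt(n) (mu_hat - mu) / sigma <= -delta}."""
    std = NormalDist()
    phi = std.pdf(delta)
    half_term = gamma3 / (6.0 * math.sqrt(n)) * hermite_he(2, delta)
    one_term = (gamma4 / 24.0 * hermite_he(3, delta)
                + gamma3 ** 2 / 72.0 * hermite_he(5, delta)) / n
    return std.cdf(-delta) - phi * half_term + phi * one_term

def erlang_cdf(s, n):
    """Exact CDF at s of the sum of n independent unit-rate exponentials."""
    if s <= 0.0:
        return 0.0
    term, total = 1.0, 1.0           # k = 0 term of sum_{k < n} s^k / k!
    for k in range(1, n):
        term *= s / k
        total += term
    return 1.0 - math.exp(-s) * total

n, delta = 20, 0.0
# sqrt(n) (mu_hat - 1) <= -delta  is the same event as  sum <= n - delta sqrt(n)
exact = erlang_cdf(n - delta * math.sqrt(n), n)
normal_only = NormalDist().cdf(-delta)
edgeworth = edgeworth_loss_probability(delta, n, gamma3=2.0, gamma4=6.0)
print(exact, normal_only, edgeworth)
```

In this strongly skewed example the skew correction accounts for nearly all of the gap between the plain normal approximation and the exact probability.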

Truncating beyond the $n^{-1/2}$ term and applying basic facts of probability
yields

$$\Pr\{\hat{\mu} \ge r_0\} \approx \Phi(\delta) + \frac{\phi(\delta)}{\sqrt{n}}\left[\frac{\gamma_3}{6}\left(\delta^2 - 1\right)\right]. \tag{9}$$

The implication is that the probability that $\hat{\mu}$ exceeds $r_0$ will be increased
if $\delta$ is large. Moreover, for a fixed $\delta$, the probability that $\hat{\mu}$ exceeds
$r_0$ is increased for large positive skew if $\delta^2 > 1$, but for large negative skew
when $\delta^2 < 1$. The implication is that when $\delta^2$ is ‘large’ (larger than one
unit), one has positive preference for skewed returns, otherwise one has negative
preference. As long as $\mu > r_0$, this is asymptotically compatible as $n \to \infty$
with the commonly held belief that investors universally value positive skew,
since $\delta$ then grows like $\sqrt{n}$.
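The sign change in the preference for skew is easy to see numerically. The sketch below evaluates the right hand side of Equation 9, read here as $\Phi(\delta) + \phi(\delta)\,\gamma_3(\delta^2 - 1)/(6\sqrt{n})$; the function name and the test values of $\delta$, $n$ and $\gamma_3$ are illustrative assumptions.

```python
import math
from statistics import NormalDist

def prob_exceed(delta, n, gamma3):
    """One-term Edgeworth approximation (Equation 9) to Pr{mu_hat >= r0}."""
    std = NormalDist()
    return std.cdf(delta) + std.pdf(delta) / math.sqrt(n) * gamma3 / 6.0 * (delta ** 2 - 1.0)

n = 60
for delta in (0.5, 2.0):
    # Compare a negatively and a positively skewed asset at the same delta.
    print(delta, prob_exceed(delta, n, -1.0), prob_exceed(delta, n, 1.0))
```

At $\delta = 0.5$ the negatively skewed asset has the higher approximate probability of exceeding $r_0$, while at $\delta = 2$ the ordering is reversed.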


2.1 Approximating Roy’s criterion 

The generalized Roy’s criterion of Equation 7 is now expressed as

$$\zeta_h =_{\mathrm{df}} -\frac{1}{\sqrt{n}}\,\Phi^{-1}\left(\Pr\left\{\sqrt{n}\,\frac{\hat{\mu} - \mu}{\sigma} \le -\delta\right\}\right). \tag{10}$$

This implicit definition is a bit unwieldy for use as an objective. One would
prefer a definition in terms of the cumulants of the returns stream. Rather than
use the Taylor series expansion of $\Phi^{-1}(x)$, one can instead use the
Cornish-Fisher expansion of the sample quantile. [9, 8, 19]

⁵ Here we assume the returns are log returns. Then the sample mean is just the rescaled
total return. By similarly rescaling the disastrous return, we arrive at the formulation here.

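Let $Y = \sqrt{n}(\hat{\mu} - \mu)/\sigma$. This is a random variable with zero mean and
unit standard deviation. Let $\gamma_i$ be the $i$th standardized cumulant of $x$. The
$i$th standardized cumulant of $Y$ is $\gamma_i / n^{i/2 - 1}$. The Cornish-Fisher
expansion [1, 26.2.49] finds $w$ in

$$\Pr\{Y \le w\} = \Phi(z),$$

in terms of $z$ and the higher order cumulants of the distribution. Setting
$w = -\delta$, we have $z = -\sqrt{n}\zeta_h$, and the Cornish-Fisher expansion reduces to

$$\begin{aligned}
\zeta_h = \frac{\mu - r_0}{\sigma}
&+ \frac{1}{n}\,\frac{\gamma_3}{6}\,\mathrm{He}_2\left(\sqrt{n}\zeta_h\right) \\
&- \frac{1}{n^{3/2}}\left[\frac{\gamma_4}{24}\mathrm{He}_3\left(\sqrt{n}\zeta_h\right)
 - \frac{\gamma_3^2}{36}\left[2\mathrm{He}_3\left(\sqrt{n}\zeta_h\right) + \mathrm{He}_1\left(\sqrt{n}\zeta_h\right)\right]\right] \\
&+ \frac{1}{n^2}\left[\frac{\gamma_5}{120}\mathrm{He}_4\left(-\sqrt{n}\zeta_h\right)
 - \frac{\gamma_3\gamma_4}{24}\left[\mathrm{He}_4\left(-\sqrt{n}\zeta_h\right) + \mathrm{He}_2\left(-\sqrt{n}\zeta_h\right)\right]\right. \\
&\qquad\quad \left. +\, \frac{\gamma_3^3}{324}\left[12\mathrm{He}_4\left(-\sqrt{n}\zeta_h\right) + 19\mathrm{He}_2\left(-\sqrt{n}\zeta_h\right)\right]\right] \\
&+ \ldots
\end{aligned} \tag{11}$$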


While this defines $\zeta_h$ implicitly, truncation gives polynomial equations, whose
roots can be found analytically or numerically. Noting that derivatives of
Hermite polynomials can be easily computed, solving iteratively for $\zeta_h$ via
Newton’s method should be simple.

Truncating at two terms gives an equation which is quadratic in $\zeta_h$, since
$\mathrm{He}_2(x) = x^2 - 1$. Taking the root that tends to $\zeta$ as $\gamma_3 \to 0$ yields
the (aesthetically unpleasing) solution:

$$\zeta_h = \frac{3}{\gamma_3}\left[1 - \sqrt{1 - \frac{2\gamma_3}{3}\zeta + \frac{\gamma_3^2}{9n}}\right]. \tag{12}$$

As an example, for garden variety applications in asset management, setting
$\zeta = 0.07\,\mathrm{day}^{-1/2}$, $\gamma_3 = -1$, $n = 60\,\mathrm{day}$, we have
$\zeta_h \approx 0.0719\,\mathrm{day}^{-1/2}$. If we consider a longer horizon, say
$n = 252\,\mathrm{day}$, one observes $\zeta_h \approx 0.0698\,\mathrm{day}^{-1/2}$.
Thus the difference between $\zeta_h$ and $\zeta$ is modest at the quarter year time scale,
but negligible at the annual time scale. Note that at the shorter time scale,
$\sqrt{n}\zeta < 1$, resulting in a boost to $\zeta_h$ due to negative skew, while at the
longer time horizon, $\zeta_h < \zeta$ since $\sqrt{n}\zeta > 1$.
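The sample calculation can be reproduced with a few lines of code. The sketch below solves the two-term truncation of Equation 11, read here as $\zeta_h = \zeta + \gamma_3 (n\zeta_h^2 - 1)/(6n)$, both in closed form and by Newton’s method started from $\zeta$. The function names, tolerances and iteration cap are illustrative assumptions.

```python
import math

def roy_two_term_closed_form(zeta, gamma3, n):
    """Root of the two-term truncation that tends to zeta as gamma3 -> 0.

    math.sqrt raises ValueError if the discriminant is negative, i.e. if
    the truncated equation has no real root.
    """
    if gamma3 == 0.0:
        return zeta
    disc = 1.0 - (2.0 * gamma3 / 3.0) * zeta + gamma3 ** 2 / (9.0 * n)
    return (3.0 / gamma3) * (1.0 - math.sqrt(disc))

def roy_two_term_newton(zeta, gamma3, n, tol=1e-12, max_iter=50):
    """Newton's method on f(z) = zeta + gamma3 (n z^2 - 1) / (6 n) - z."""
    z = zeta
    for _ in range(max_iter):
        f = zeta + gamma3 / (6.0 * n) * (n * z * z - 1.0) - z
        f_prime = gamma3 * z / 3.0 - 1.0
        step = f / f_prime
        z -= step
        if abs(step) < tol:
            break
    return z

zeta, gamma3 = 0.07, -1.0
for n in (60, 252):
    print(n, roy_two_term_closed_form(zeta, gamma3, n), roy_two_term_newton(zeta, gamma3, n))
```

The same Newton iteration extends to higher order truncations by adding the corresponding Hermite terms and their derivatives to `f` and `f_prime`.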


3 Discussion 

It is not the purpose of this note to suggest that investors should optimize
$\zeta_h$. Prima facie, the generalized Roy’s criterion appears inconsistent with
the received wisdom that investors should maximize expected utility, or that
their behavior corresponds somehow to decreasing risk aversion.⁶ Moreover, since
Roy’s criterion dichotomizes future returns, it shares some of the hallmark
failings of the Value at Risk measure, viz. that it does not control for severe
tail losses, may not promote diversification, etc. [3] Note, however, that Roy
was decidedly unenthusiastic about the prospect of maximizing expected utility,
for pragmatic and philosophical reasons, writing, “a man who seeks advice about
his actions will not be grateful for the suggestion that he maximise expected
utility.” [14, p. 433]

⁶ Perhaps Roy’s criterion can be expressed in the classical framework as a Heaviside utility
function.

While we do not have positive proof of investors who do maximize Roy’s
criterion, we can easily imagine there are some who might. For example, at
times a professional portfolio manager might try to maximize the probability of
beating their benchmark over the next month, fearing withdrawals from their
fund. While investors cannot easily estimate, ex post, what the ex ante
expected return of an investment should have been, they do exhibit a tendency to
dichotomize their holdings as ‘winners’ or ‘losers’.

Optimization of Roy’s criterion provides an interesting mechanism by which 
fully informed agents can agree on all moments of returns of an instrument, yet 
rank the instrument differently based entirely on term. The short term investor 
essentially sells (or leases, really) positive skew to the long term investor. It 
is not at all clear, however, that this differential preference for skew drives the 
classical narrative of ‘investors’ versus ‘speculators’; perhaps these two mythical 
groups can be separated by their appetite for kurtosis. 

Finally, as a practical matter, it must be noted that maximization of Roy’s
criterion is largely a quixotic pursuit. As illustrated in the sample calculation
above, the difference between $\zeta$ and $\zeta_h$ tends to be small, much smaller than
the estimation error around $\zeta_h$. Involving estimates of the higher order moments
of the returns distribution will only increase that estimation error. [10, 11, 12]
A A counterexample

Let $x$ have mean and variance $\mu > 0$ and $\sigma^2$, respectively. Let $y$ have the
same distribution as $x$, except that with probability $p > 0$, independently of $x$,
it has an additional ‘bonus’ return of a constant $B > 0$. Clearly $y$ (first-order)
stochastically dominates $x$. The mean of $y$ is equal to $\mu + pB$. The uncentered
second moment of $y$ is equal to $\sigma^2 + \mu^2 + 2\mu pB + pB^2$, so the variance of
$y$ is $\sigma^2 + p(1-p)B^2$. The Sharpe ratio of $y$ (taking $r_0 = 0$) is thus equal to

$$\frac{\mu + pB}{\sqrt{\sigma^2 + p(1-p)B^2}}.$$

Then if, for example, $\mu = 0.001$, $\sigma = 0.01$, $p = 10^{-4}$, and $B = 0.25$, the
Sharpe ratio of $x$ is 0.1, while the Sharpe ratio of $y$ is 0.0994.

In fact, we can construct a sufficient condition for the Sharpe ratio to be
reversed in this case. Since $\mu$, $p$ and $B$ are assumed positive,

$$\begin{aligned}
\frac{\mu + pB}{\sqrt{\sigma^2 + p(1-p)B^2}} \le \frac{\mu}{\sigma}
&\Leftrightarrow \frac{(\mu + pB)^2}{\sigma^2 + p(1-p)B^2} \le \frac{\mu^2}{\sigma^2}, \\
&\Leftrightarrow \sigma^2(\mu + pB)^2 \le \mu^2\left(\sigma^2 + p(1-p)B^2\right), \\
&\Leftrightarrow \sigma^2\left(2\mu pB + p^2B^2\right) \le \mu^2 p(1-p)B^2, \\
&\Leftrightarrow \sigma^2(2\mu + pB) \le \mu^2(1-p)B, \\
&\Leftrightarrow pB\left(\sigma^2 + \mu^2\right) \le B\mu^2 - 2\mu\sigma^2, \\
&\Leftrightarrow p \le \frac{\mu^2}{\sigma^2 + \mu^2} - \frac{2\mu\sigma^2}{B\left(\sigma^2 + \mu^2\right)}.
\end{aligned}$$

In order for this last inequality to admit a solution with positive $p$, one must
have

$$\frac{B}{2} > \frac{\sigma^2}{\mu}.$$

For the example above, this ‘minimum’ value of $B$ is 0.2, while the maximum
acceptable value for $p$ is 0.0020.
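The reversal can be confirmed by brute force on a small discrete example. The sketch below takes $x$ to be a two-point distribution with the stated mean and standard deviation (a hypothetical choice, since the argument does not depend on the shape of $x$), adds the bonus independently of $x$ (also an assumption), and computes both Sharpe ratios and the dominance relation by direct enumeration. All function names are illustrative.

```python
import math

def mean_and_sd(outcomes):
    """Mean and standard deviation of a discrete distribution [(value, prob), ...]."""
    mean = sum(v * q for v, q in outcomes)
    var = sum(q * (v - mean) ** 2 for v, q in outcomes)
    return mean, math.sqrt(var)

def cdf(outcomes, t):
    """Pr{value <= t} for a discrete distribution."""
    return sum(q for v, q in outcomes if v <= t)

mu, sigma, p, bonus = 0.001, 0.01, 1e-4, 0.25

# Two-point x with mean mu and standard deviation sigma.
x = [(mu - sigma, 0.5), (mu + sigma, 0.5)]
# y is x plus an independent bonus paid with probability p.
y = [(v + b, q * qb) for v, q in x for b, qb in ((0.0, 1.0 - p), (bonus, p))]

mean_x, sd_x = mean_and_sd(x)
mean_y, sd_y = mean_and_sd(y)
sharpe_x, sharpe_y = mean_x / sd_x, mean_y / sd_y

# First order dominance: the CDF of y never exceeds that of x
# (a small tolerance absorbs floating point rounding).
support = sorted({v for v, _ in x + y})
dominates = all(cdf(y, t) <= cdf(x, t) + 1e-12 for t in support)

print(sharpe_x, sharpe_y, dominates)
```

Here $y$ dominates $x$ yet has the lower Sharpe ratio, because the rare bonus adds more to the standard deviation than to the mean.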